
FEAT Add ToxicChat dataset loader #1422

Open
romanlutz wants to merge 9 commits into Azure:main from
romanlutz:romanlutz/add-toxic-chat-dataset

Conversation

@romanlutz (Contributor) commented:

Add remote dataset loader for ToxicChat (lmsys/toxic-chat), containing ~10k real user-chatbot conversations from Chatbot Arena annotated for toxicity and jailbreaking attempts. Gracefully skips entries with Jinja2-incompatible content.

Copilot AI review requested due to automatic review settings on March 1, 2026 14:20
@romanlutz changed the title from "Add ToxicChat dataset loader" to "FEAT Add ToxicChat dataset loader" on Mar 1, 2026
Copilot AI (Contributor) left a comment


Pull request overview

Adds a new remote seed dataset loader for the HuggingFace lmsys/toxic-chat dataset so it can be discovered/loaded through PyRIT’s dataset provider framework, with accompanying unit tests and docs update.

Changes:

  • Introduces _ToxicChatDataset remote loader that converts ToxicChat rows into SeedPrompts and skips prompts that fail template parsing.
  • Adds unit tests for the ToxicChat loader.
  • Updates dataset-loading documentation output to include toxic_chat in the available dataset list.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File descriptions:

  • pyrit/datasets/seed_datasets/remote/toxic_chat_dataset.py: New HuggingFace-backed loader that builds SeedDataset/SeedPrompt entries from ToxicChat rows and skips problematic entries.
  • tests/unit/datasets/test_toxic_chat_dataset.py: New unit tests validating basic loading behavior and custom HF config plumbing.
  • pyrit/datasets/seed_datasets/remote/__init__.py: Exposes and imports the new loader for discovery/registration.
  • doc/code/datasets/1_loading_datasets.ipynb: Updates the displayed list of available datasets to include toxic_chat.
Comments suppressed due to low confidence (1)

tests/unit/datasets/test_toxic_chat_dataset.py:78

  • For consistency with other dataset loader tests (e.g., DarkBench), this test should assert that split (and optionally cache) are forwarded to _fetch_from_huggingface in addition to dataset_name and config; otherwise regressions in argument plumbing won't be caught.
        with patch.object(loader, "_fetch_from_huggingface", return_value=mock_toxic_chat_data) as mock_fetch:
            dataset = await loader.fetch_dataset()

            assert len(dataset.seeds) == 2
            mock_fetch.assert_called_once()
            call_kwargs = mock_fetch.call_args.kwargs
            assert call_kwargs["dataset_name"] == "custom/toxic-chat"
            assert call_kwargs["config"] == "custom_config"
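
A minimal, self-contained sketch of the extra assertions the reviewer asks for. The loader and its patched fetch call are simulated here with a plain MagicMock (the real test patches `_fetch_from_huggingface` on the loader, as in the snippet above); argument names mirror the test:

```python
from unittest.mock import MagicMock

# Stand-in for the patched _fetch_from_huggingface call.
mock_fetch = MagicMock(return_value=[])

# Simulate fetch_dataset() forwarding its arguments to the fetch helper.
mock_fetch(
    dataset_name="custom/toxic-chat",
    config="custom_config",
    split="test",
    cache=True,
)

kwargs = mock_fetch.call_args.kwargs
assert kwargs["dataset_name"] == "custom/toxic-chat"
assert kwargs["config"] == "custom_config"
# The additional assertions the review suggests, so regressions in
# argument plumbing are caught:
assert kwargs["split"] == "test"
assert kwargs["cache"] is True
```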

@romanlutz force-pushed the romanlutz/add-toxic-chat-dataset branch 2 times, most recently from cb93f7f to c63374c, on March 1, 2026 14:54
Copilot AI review requested due to automatic review settings on March 1, 2026 14:54
Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

tests/unit/datasets/test_toxic_chat_dataset.py:78

  • This test configures split="test" but doesn't assert that fetch_dataset() passes split through to _fetch_from_huggingface (unlike similar tests for other datasets). Adding an assertion for call_kwargs["split"] (and optionally cache) would improve coverage of the custom-config behavior.
        loader = _ToxicChatDataset(
            dataset_name="custom/toxic-chat",
            config="custom_config",
            split="test",
        )

        with patch.object(loader, "_fetch_from_huggingface", return_value=mock_toxic_chat_data) as mock_fetch:
            dataset = await loader.fetch_dataset()

            assert len(dataset.seeds) == 2
            mock_fetch.assert_called_once()
            call_kwargs = mock_fetch.call_args.kwargs
            assert call_kwargs["dataset_name"] == "custom/toxic-chat"
            assert call_kwargs["config"] == "custom_config"

Add remote dataset loader for ToxicChat (lmsys/toxic-chat), containing ~10k real
user-chatbot conversations from Chatbot Arena annotated for toxicity and
jailbreaking attempts. Gracefully skips entries with Jinja2-incompatible content.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@romanlutz force-pushed the romanlutz/add-toxic-chat-dataset branch from c63374c to 79e2bf9 on March 2, 2026 12:58
The HF dataset identifier is now a class constant HF_DATASET_NAME
instead of a constructor parameter, consistent with other loaders.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
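
An illustrative sketch of the pattern this commit describes: the HuggingFace dataset id lives on the class as a constant rather than being a constructor parameter. Everything except the `HF_DATASET_NAME` constant and the `lmsys/toxic-chat` id is a hypothetical simplification, not PyRIT's actual class:

```python
# Hypothetical, simplified loader shape. Only HF_DATASET_NAME and the
# "lmsys/toxic-chat" id come from the PR; the rest is illustrative.
class ToxicChatLoaderSketch:
    HF_DATASET_NAME = "lmsys/toxic-chat"  # fixed on the class, not per-instance

    def __init__(self, config: str = "default_config", split: str = "train"):
        # Only config/split remain configurable; the dataset id cannot be
        # overridden through the constructor anymore.
        self.config = config
        self.split = split


loader = ToxicChatLoaderSketch(split="test")
assert loader.HF_DATASET_NAME == "lmsys/toxic-chat"
assert loader.split == "test"
```

Keeping the id as a class constant matches the other loaders and removes a constructor knob that, if set wrong, would silently fetch a different dataset.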
Copilot AI review requested due to automatic review settings on March 2, 2026 13:39
Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

doc/code/datasets/1_loading_datasets.ipynb:242

  • The notebook metadata language_info.version changed from 3.11.14 to 3.13.5, which is typically just an artifact of the author’s local environment. To reduce churn in docs, consider keeping this stable (or stripping environment-specific metadata) when updating notebooks.
   "version": "3.13.5"

romanlutz and others added 2 commits on March 2, 2026 05:55
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings on March 2, 2026 14:02
Raw wrapping preserves original dataset text including Jinja2-like
syntax (HTML, template tags), eliminating the need to catch and skip
TemplateSyntaxError entries.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
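
The raw-wrapping step this commit describes can be sketched as plain string formatting: surround the original text with Jinja2 `{% raw %}` / `{% endraw %}` tags so template-like content (`{{ var }}`, `{% if %}`, HTML) is preserved verbatim instead of being parsed. The helper name `wrap_raw` is hypothetical:

```python
# Sketch of the raw-wrapping step. No Jinja2 import is needed for the
# wrapping itself; the tags only matter when the prompt is later rendered.
def wrap_raw(text: str) -> str:
    return f"{{% raw %}}{text}{{% endraw %}}"


wrapped = wrap_raw("Hi {{ user }}, <b>click</b> {% if x %}here{% endif %}")
assert wrapped.startswith("{% raw %}")
assert wrapped.endswith("{% endraw %}")
assert "{{ user }}" in wrapped  # original template-like text preserved
# Caveat (raised later in this review): input that itself contains
# "{% endraw %}" closes the wrapper early and can still break parsing.
```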
Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

pyrit/datasets/seed_datasets/remote/toxic_chat_dataset.py:112

  • Remote dataset loaders in this repo typically wrap prompt text in Jinja2 {% raw %}...{% endraw %} before constructing a SeedPrompt so that SeedPrompt.__post_init__ doesn't render/alter prompt content and so template-like text doesn't trigger parsing errors. Here value=item["user_input"] plus catching TemplateSyntaxError means some prompts will be silently dropped and others may still be mutated by Jinja rendering. Consider wrapping user_input in raw tags (and only skipping if it still fails), and include the conv_id in the debug log for traceability.
                description=description,
                source=source_url,
                authors=authors,
                groups=groups,
                metadata={
                    "toxicity": str(item.get("toxicity", "")),
                    "jailbreaking": str(item.get("jailbreaking", "")),
                    "human_annotation": str(item.get("human_annotation", "")),
                },
            )
            for item in data
        ]

        logger.info(f"Successfully loaded {len(seed_prompts)} prompts from ToxicChat dataset")

        return SeedDataset(seeds=seed_prompts, dataset_name=self.dataset_name)

romanlutz and others added 2 commits on March 2, 2026 06:49
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings on March 2, 2026 15:11
Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment on lines +58 to +61
async def test_fetch_dataset_preserves_jinja2_content(self):
"""Test that entries with Jinja2-like content are preserved via raw wrapping."""
data_with_html = [
{
Copilot AI commented on Mar 2, 2026:


The tests don’t exercise the advertised behavior of skipping truly Jinja2-incompatible inputs (e.g., user_input containing {% endraw %} that can break parsing even when wrapped). After adding skip/exception handling in the loader, add a unit test asserting such an entry is skipped while the rest of the dataset still loads.

Comment on lines +91 to +95
seed_prompts = [
SeedPrompt(
value=f"{{% raw %}}{item['user_input']}{{% endraw %}}",
data_type="text",
dataset_name=self.dataset_name,
Copilot AI commented on Mar 2, 2026:


fetch_dataset builds SeedPrompt objects in a list comprehension, but SeedPrompt.__post_init__ can raise when Jinja2 cannot parse the template (e.g., if user_input contains {% endraw %} or otherwise breaks the outer {% raw %}...{% endraw %} wrapper). In that case, the entire dataset load will fail rather than “gracefully skipping” bad entries as described. Consider switching to an explicit loop that wraps SeedPrompt(...) construction in a try/except (catch ValueError / Jinja2 TemplateSyntaxError), logs a warning with conv_id, and continues.
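
A self-contained sketch of the explicit loop this comment suggests. `SeedPrompt` construction is simulated by a small `make_prompt` helper that raises when the wrapped text would still break the outer raw tags; `TemplateError`, `make_prompt`, and `build_prompts` are hypothetical stand-ins, not PyRIT APIs:

```python
import logging

logger = logging.getLogger(__name__)


class TemplateError(ValueError):
    """Stand-in for the ValueError / jinja2 TemplateSyntaxError SeedPrompt can raise."""


def make_prompt(text: str) -> str:
    # Stand-in for SeedPrompt(...) construction: raise when the content
    # itself closes the {% raw %}...{% endraw %} wrapper early.
    if "{% endraw %}" in text:
        raise TemplateError("content closes the raw wrapper early")
    return f"{{% raw %}}{text}{{% endraw %}}"


def build_prompts(data):
    prompts = []
    for item in data:  # explicit loop instead of a list comprehension
        try:
            prompts.append(make_prompt(item["user_input"]))
        except TemplateError:
            # Log the conv_id for traceability, skip the entry, keep going.
            logger.warning("Skipping ToxicChat entry %s", item.get("conv_id"))
    return prompts


data = [
    {"conv_id": "a1", "user_input": "hello"},
    {"conv_id": "b2", "user_input": "bad {% endraw %} input"},
    {"conv_id": "c3", "user_input": "<b>html</b> and {{ var }}"},
]
assert len(build_prompts(data)) == 2  # one entry skipped, the load still succeeds
```

With this shape, a single malformed row degrades to a logged warning instead of failing the whole `fetch_dataset` call.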
